Multimodality – technology, visions and demands for the future

Authors

  • Kristiina Jokinen
  • Antti Raike
Abstract

In this paper we will discuss currently available multimodal technologies and how research on multimodal interfaces can be used to improve human-computer interaction (HCI). We will focus on mechanisms that are used to make communication more natural and accessible for all, and on conceptual barriers that limit the use of multimodality in interface design. We will present CinemaSense as an example of how Design for All principles can be taken into account when building a web-based portal for distance learning.

1 Metaphors for human-computer interaction

The computer is usually regarded as a tool which supports human goals: its role is to be a passive and transparent 'slave' under human control. In recent years, however, another metaphor has become available: the computer as an 'agent'. This is not only because of soccer-playing robots, Sony AIBO, and Honda's humanoid ASIMO, examples of advanced computer agents which are capable of moving, sensing their environment, and performing certain tasks, often interacting with humans via spoken natural language commands. Rather, the concept of the computer has changed: it is no longer only a tool that is used to perform certain tasks, but a participant with which the user interacts and to which she communicates her needs. For instance, as users need to work with different applications (graphics, text, communication protocols), all of which usually run in separate application environments, software agents can be attached to the various applications so as to act as representatives of the user's goals in the given environment. They can thus facilitate the user's work amid the complexity of different applications, making it easier and more enjoyable. On the other hand, the growth of information on the Internet, in electronic databases, digital reporting, eCommerce, eLearning etc. has also contributed to the new design concept: an interface agent would understand both natural language requests and the logic behind the service, and help the user to find interesting and useful information in huge and complex knowledge bases. It would coordinate multimodal input and produce a multimodal response, thus making the interaction between human users and information services more natural and more accessible. Finally, technological development allows us to build multimodal systems that can interact with humans using various types of input and output modalities: extensions to the traditional text and graphical user interface (GUI) include gestures, speech, tactile and haptic interfaces, as well as interfaces based on eye-tracking.

With the help of the wide variety of multimodal interface technology, the agent metaphor can be realised as a conversational interface agent which supports flexibility and mediates interaction between the user and digital information. For instance, in the SmartKom project (Wahlster et al. 2001), the metaphor is realised in the Smartakus interface agent, which takes care of the user's requests with respect to different applications and application scenarios. Interface agents are sometimes also called Embodied Conversational Agents (Cassell et al. 2000), because of their more natural and human-like communicative capabilities: they listen, speak, gesture, etc., like humans. However, models for multimodal communication are still a research issue, and we lack theories for multimodal communication in general.

¹ The author's main affiliation is currently at the University of Helsinki, email: [email protected]
More research is needed to assess the usefulness and limits of the 'computer as an agent' metaphor. What are the most natural and intuitive interaction modes in human-computer interaction? Open issues deal with the design and use of multimodal interfaces: how do human speech and gestures, gaze and facial expressions contribute to communication in general; how to determine which modalities are appropriate for particular tasks; how to combine modalities into a clear, easy-to-use interface; how to address the criteria for usability and accessibility; how to build interfaces that are affective; and are there aesthetic and artistic aspects in interface design?

In this paper we will discuss currently available multimodal technologies and how research on multimodal interfaces can be used to improve human-computer interaction (HCI). In particular, we will focus on interaction mechanisms that are used to make communication more natural and accessible for all, and on conceptual barriers that limit the use of multimodality in interface design. As a case study we will present CinemaSense, which was developed by the second author as a portal for film making, in a study of the use of collaborative distance learning for the development of deaf students' understanding of film-making concepts.

The paper is organised as follows. We first give definitions for multimodality and multimodal systems in Section 2. In Section 3, we describe the case study of CinemaSense, and in Section 4, we discuss reasons for using multimodality in interface design. Finally, we provide challenges and visions for future research in Section 5, and conclusions in Section 6.

2 Multimodality and multimodal systems

2.1 Some Basic Definitions

When considering multimodal human-computer interaction, it is useful to consider what multimodality means in human communication: it refers to the use of a variety of sensory input/output channels which allow sensory data to be received and transformed into higher-level representations, and through which manipulation of the environment can take place. Human cognition assumes that a combination of multimodal information is available: a constant stream of visual, auditory, olfactory and tactile sensations is processed in the brain, where the data generates cognitive and emotional states. The cognitive state then functions as a starting point for executing an action through the motor control system, which controls and coordinates the information flow. In fact, multimodality seems so natural to human communication that we do not pay attention to the whole range of modalities used in our everyday interactions: we speak, write, sign, lip-read, draw, gesture, touch, listen, and watch with ease, and hence have developed a great deal of tacit knowledge about multimodal communication. Analogously to human-human interaction, multimodality in human-computer interaction can be seen as the use of different input/output channels.
There is a need for terminological clarification, though, and we adopt the following definitions of Maybury and Wahlster (1998):

• Medium = the material object used for presenting or saving information; physical carriers, including computer input/output devices (sounds, movements, screen, microphone, speaker, pointer)
• Code = a system of symbols used for communication (natural languages, gesture languages)
• Mode (modality) = a human mechanism of perception, the senses employed to process incoming information (vision, audition, olfaction, touch, taste)

The ISLE/NIMM standardization group for natural multimodal interaction (Dybkjaer et al. 2002) uses a two-way definition in which the coding of information and the senses employed to process the information are conflated. According to this, a medium is, as above, the physical channel for information encoding (visual, audio, gestures), while a modality is a particular way of encoding information in some medium. The same modality can be represented in different media: e.g. spoken language is a modality expressed in the acoustic medium, whereas written language is a modality expressed in the medium of light/graphics. Spoken language may also be expressed in the medium of light/graphics, as lip movements or textual transcription. However, we believe it is useful to distinguish the abstract system of symbols from the actual encoding of the symbols, and, following Maybury and Wahlster (1998), to say that the natural language code can use speech or written text as media, but rely on visual or auditory modalities (and associated media such as keyboard and microphone) when being processed.

2.2 Multimodal systems and interfaces

The EAGLES expert and advisory group (Gibbon et al. 2000) defines different interaction systems as follows:

• Multimedia systems offer more than one device for the user to input to the system and for the system to give feedback to the user (e.g. microphone, speaker, keyboard, mouse, touch screen, camera), but they do not generate abstract concepts automatically, nor transform information into higher-level representations.
• Multimodal systems represent and manipulate information from different human communication channels at multiple levels of abstraction.

Multimodal systems thus emphasize abstract levels of processing, explicit representations of the dialogue context and the user, and investigations of the user's beliefs, intentions, attitudes, capabilities, and preferences. Typically they include components that take care of media and mode analysis and design, interaction and context management, user modelling and knowledge sources, whereas multimedia systems refer to various types of speech, graphical and direct-manipulation interfaces, which usually do not have such modules. While the distinction between multimodal systems and interface agents no longer seems appropriate, due to the new agent metaphor, it is still good to keep in mind that the research focus in these traditions has been different, and has mainly concerned whether or not to use natural language in interaction. This explains the current integration and the possibilities that multimodality opens up for the development of interactive systems and the design of new kinds of interfaces. EAGLES, for instance, further distinguishes multimodal audio-visual speech systems that utilize the same input/output channels as humans by integrating non-verbal cues (facial expressions, eye-gaze, lip movements) with speech recognition and synthesis.
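To make these definitions concrete, the following short Python sketch (our illustration only; none of the cited systems is implemented this way, and all names in it are hypothetical) keeps media and modalities apart, as the Maybury and Wahlster terminology suggests, and shows the kind of higher-level, medium-independent representation that separates a multimodal system from a multimedia one:

    from dataclasses import dataclass
    from enum import Enum

    class Medium(Enum):
        # Physical carriers: computer input/output devices
        MICROPHONE = "microphone"
        KEYBOARD = "keyboard"
        CAMERA = "camera"

    class Modality(Enum):
        # Human perceptual channels employed to process information
        AUDITORY = "audition"
        VISUAL = "vision"
        TACTILE = "touch"

    @dataclass
    class InputEvent:
        medium: Medium       # where the signal physically came from
        modality: Modality   # the sense associated with its encoding
        payload: str         # raw data, simplified to a string here
        timestamp: float     # onset time in seconds

    def analyse(event: InputEvent) -> dict:
        # Mode analysis: lift a raw event to an abstract, medium-independent
        # representation. A multimedia system stops at the raw payload; a
        # multimodal system produces a structure like this, which interaction
        # and context management can combine with a user model.
        return {"act": "command", "words": event.payload.split()}

    # The same natural language code arriving through two different media:
    spoken = InputEvent(Medium.MICROPHONE, Modality.AUDITORY, "put that there", 2.1)
    typed = InputEvent(Medium.KEYBOARD, Modality.VISUAL, "put that there", 5.8)
    print(analyse(spoken) == analyse(typed))  # True: same code, different media

The design point is simply that the physical device, the perceptual channel, and the abstract code are kept separate, so that the same message can be processed regardless of the medium through which it arrives.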
One of the first multimodal systems was Bolt's Put That There system (Bolt 1980), in which users interacted with a world projected on the wall using speech and pointing gestures. The main research goal was to study how actions in one modality can disambiguate actions in another (a toy sketch of this kind of fusion is given at the end of this section). Since then, several systems have been built; see references e.g. in Maybury (1993), Wahlster (2001), Dybkjaer et al. (2002), and Granström (2002). The term Perceptual User Interfaces (PUI) was introduced by Turk and Robertson (2000) to cover interfaces that combine natural human capabilities (communication, motor, cognitive, and perceptual skills) with computer I/O devices.

New types of multimodal interfaces can be found e.g. in QuiQui's Giant Bounce, a computer game for 4- to 9-year-old children (Hämäläinen et al. 2001). It is controlled via body movement and voice, thus extending natural interaction from traditional keyboards and graphics to a web camera and microphone. Eye-gaze is also used in new types of interfaces (Majaranta & Räihä 2002); for severely handicapped people, eye-typing may be the only way to interact. Milekic (2002) argues in favour of tangible interaction, where 'tangible' includes all sensory modalities and is not reduced to just those related to the sense of touch (haptic, cutaneous, tactile). He focuses especially on multimodal output: abstract data manipulations should be tangible and accessible to humans also in the sense that feedback about successfully performed tasks can be given through physical sensory information, not only through visual (graphical changes) or auditory (a click) signals. As examples of this trend he mentions various computer game interfaces, the Logitech iFeel™ mouse, which allows users to "feel" different objects, and affective computing, where the affective state of a human user becomes accessible to a computer, or to another, remote user.
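To make the idea of cross-modal disambiguation concrete, here is a minimal Python sketch of fusion in the spirit of Put That There. It is our own simplification, not Bolt's implementation, and all names in it are hypothetical: each deictic word in the speech channel is resolved against the pointing gesture closest to it in time.

    from dataclasses import dataclass

    @dataclass
    class Pointing:
        x: float
        y: float
        timestamp: float   # moment the pointing gesture peaked, in seconds

    def resolve_deixis(words, word_times, gestures):
        # Replace each deictic word with the coordinates of the pointing
        # gesture closest to it in time -- a toy version of the
        # speech/gesture fusion performed in Put That There.
        resolved = []
        for word, t in zip(words, word_times):
            if word in ("this", "that", "here", "there") and gestures:
                nearest = min(gestures, key=lambda g: abs(g.timestamp - t))
                resolved.append((word, (nearest.x, nearest.y)))
            else:
                resolved.append((word, None))
        return resolved

    # The user says "put that there" while pointing twice at the projection:
    words = ["put", "that", "there"]
    word_times = [0.0, 0.4, 1.1]   # onset of each word, in seconds
    gestures = [Pointing(120, 80, 0.5), Pointing(300, 210, 1.2)]

    for word, referent in resolve_deixis(words, word_times, gestures):
        print(word, referent)
    # put None
    # that (120, 80)
    # there (300, 210)

Real systems must of course handle recognition uncertainty and richer context, but temporal alignment of this kind remains a basic ingredient of multimodal fusion.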


Publication year: 2003